canonical correlation
Insights on representational similarity in neural networks with canonical correlation
Comparing different neural network representations and determining how representations evolve over time remain challenging open questions in our understanding of the function of neural networks. Comparing representations in neural networks is fundamentally difficult as the structure of representations varies greatly, even across groups of networks trained on identical tasks, and over the course of training. Here, we develop projection weighted CCA (Canonical Correlation Analysis) as a tool for understanding neural networks, building off of SVCCA, a recently proposed method (Raghu et al, 2017). We first improve the core method, showing how to differentiate between signal and noise, and then apply this technique to compare across a group of CNNs, demonstrating that networks which generalize converge to more similar representations than networks which memorize, that wider networks converge to more similar solutions than narrow networks, and that trained networks with identical topology but different learning rates converge to distinct clusters with diverse representations. We also investigate the representational dynamics of RNNs, across both training and sequential timesteps, finding that RNNs converge in a bottom-up pattern over the course of training and that the hidden state is highly variable over the course of a sequence, even when accounting for linear transforms. Together, these results provide new insights into the function of CNNs and RNNs, and demonstrate the utility of using CCA to understand representations.
Measuring Teaching with LLMs
Objective and scalable measurement of teaching quality is a persistent challenge in education. While Large Language Models (LLMs) offer potential, general-purpose models have struggled to reliably apply complex, authentic classroom observation instruments. This paper uses custom LLMs built on sentence-level embeddings, an architecture better suited for the long-form, interpretive nature of classroom transcripts than conventional subword tokenization. We systematically evaluate five different sentence embeddings under a data-efficient training regime designed to prevent overfitting. Our results demonstrate that these specialized models can achieve human-level and even super-human performance with expert human ratings above 0.65 and surpassing the average human-human rater correlation. Further, through analysis of annotation context windows, we find that more advanced models-those better aligned with human judgments-attribute a larger share of score variation to lesson-level features rather than isolated utterances, challenging the sufficiency of single-turn annotation paradigms. Finally, to assess external validity, we find that aggregate model scores align with teacher value-added measures, indicating they are capturing features relevant to student learning. However, this trend does not hold at the individual item level, suggesting that while the models learn useful signals, they have not yet achieved full generalization. This work establishes a viable and powerful new methodology for AI-driven instructional measurement, offering a path toward providing scalable, reliable, and valid feedback for educator development.
Graph Canonical Correlation Analysis
Park, Hongju, Bai, Shuyang, Ye, Zhenyao, Lee, Hwiyoung, Ma, Tianzhou, Chen, Shuo
CCA considers the following maximization problem: max a,b(a X Y b) subject to a X X a 1 and b Y Y b 1, where the vectors a and b and the correlation are said to be canonical vectors and canonical correlation if they attain the above maximization. In the classical canonical correlation analysis, the canonical vectors a and b include nonzero loadings for all X and Y variables. However, in a high-dimensional setting with p, q n, the goal is to identify which subsets of X are associated with subsets Y and estimate the measure of associations, as the canonical correlation with the full dataset is overly high due to estimation bias caused by overfitting. To ensure the sparsity, shrinkage methods 4 Biometrics, 000 0000 are commonly used. For example, Witten et al. (2009) propose sparse canonical correlation analysis (sCCA). The criterion of sCCA can be in general expressed as follows: max a,b a X Y b subject to a X X a 1, b Y Y b 1, P 1( a) k 1, P 2( b) k 2, where P 1 and P 2 are convex penalty functions for penalization for a and b with positive constants k 1 and k 2, respectively. A representative penalty function is a ℓ 1 penalty function such that P 1(a) = a 1 and P 2(b) = b 1. sCCA imposes zero loadings in canonical vectors and thus only selects subsets of correlated X and Y . However, sCCA methods may neither fully recover correlated X and Y pairs nor capture the multivariate-to-multivariate linkage patterns (see Figure 3) because the ℓ 1 shrinkage tends to select only a small subset from the associated variables of X and Y .
A Shape-Based Functional Index for Objective Assessment of Pediatric Motor Function
Kumar, Shashwat, Rahman, Arafat, Gutierrez, Robert, Livermon, Sarah, McCrady, Allison N., Blemker, Silvia, Scharf, Rebecca, Srivastava, Anuj, Barnes, Laura E.
Clinical assessments for neuromuscular disorders, such as Spinal Muscular Atrophy (SMA) and Duchenne Muscular Dystrophy (DMD), continue to rely on subjective measures to monitor treatment response and disease progression. We introduce a novel method using wearable sensors to objectively assess motor function during daily activities in 19 patients with DMD, 9 with SMA, and 13 age-matched controls. Pediatric movement data is complex due to confounding factors such as limb length variations in growing children and variability in movement speed. Our approach uses Shape-based Principal Component Analysis to align movement trajectories and identify distinct kinematic patterns, including variations in motion speed and asymmetry. Both DMD and SMA cohorts have individuals with motor function on par with healthy controls. Notably, patients with SMA showed greater activation of the motion asymmetry pattern. We further combined projections on these principal components with partial least squares (PLS) to identify a covariation mode with a canonical correlation of r = 0.78 (95% CI: [0.34, 0.94]) with muscle fat infiltration, the Brooke score (a motor function score), and age-related degenerative changes, proposing a novel motor function index. This data-driven method can be deployed in home settings, enabling better longitudinal tracking of treatment efficacy for children with neuromuscular disorders.
Canonical Correlation Guided Deep Neural Network
Chen, Zhiwen, Mo, Siwen, Ke, Haobin, Ding, Steven X., Jiang, Zhaohui, Yang, Chunhua, Gui, Weihua
Learning representations of two views of data such that the resulting representations are highly linearly correlated is appealing in machine learning. In this paper, we present a canonical correlation guided learning framework, which allows to be realized by deep neural networks (CCDNN), to learn such a correlated representation. It is also a novel merging of multivariate analysis (MVA) and machine learning, which can be viewed as transforming MVA into end-to-end architectures with the aid of neural networks. Unlike the linear canonical correlation analysis (CCA), kernel CCA and deep CCA, in the proposed method, the optimization formulation is not restricted to maximize correlation, instead we make canonical correlation as a constraint, which preserves the correlated representation learning ability and focuses more on the engineering tasks endowed by optimization formulation, such as reconstruction, classification and prediction. Furthermore, to reduce the redundancy induced by correlation, a redundancy filter is designed. We illustrate the performance of CCDNN on various tasks. In experiments on MNIST dataset, the results show that CCDNN has better reconstruction performance in terms of mean squared error and mean absolute error than DCCA and DCCAE. Also, we present the application of the proposed network to industrial fault diagnosis and remaining useful life cases for the classification and prediction tasks accordingly. The proposed method demonstrates superior performance in both tasks when compared to existing methods. Extension of CCDNN to much more deeper with the aid of residual connection is also presented in appendix.
A prototype-based model for set classification
Mohammadi, Mohammad, Ghosh, Sreejita
Classification of sets of inputs (e.g., images and texts) is an active area of research within both computer vision (CV) and natural language processing (NLP). A common way to represent a set of vectors is to model them as linear subspaces. In this contribution, we present a prototype-based approach for learning on the manifold formed from such linear subspaces, the Grassmann manifold. Our proposed method learns a set of subspace prototypes capturing the representative characteristics of classes and a set of relevance factors automating the selection of the dimensionality of the subspaces. This leads to a transparent classifier model which presents the computed impact of each input vector on its decision. Through experiments on benchmark image and text datasets, we have demonstrated the efficiency of our proposed classifier, compared to the transformer-based models in terms of not only performance and explainability but also computational resource requirements.
The Randomized Dependence Coefficient
We introduce the Randomized Dependence Coefficient (RDC), a measure of nonlinear dependence between random variables of arbitrary dimension based on the Hirschfeld-Gebelein-Rényi Maximum Correlation Coefficient. RDC is defined in terms of correlation of random non-linear copula projections; it is invariant with respect to marginal distribution transformations, has low computational cost and is easy to implement: just five lines of R code, included at the end of the paper.
Efficient Algorithms for the CCA Family: Unconstrained Objectives with Unbiased Gradients
Chapman, James, Wells, Lennie, Aguila, Ana Lawry
The Canonical Correlation Analysis (CCA) family of methods is foundational in multi-view learning. Regularised linear CCA methods can be seen to generalise Partial Least Squares (PLS) and be unified with a Generalized Eigenvalue Problem (GEP) framework. However, classical algorithms for these linear methods are computationally infeasible for large-scale data. Extensions to Deep CCA show great promise, but current training procedures are slow and complicated. First we propose a novel unconstrained objective that characterizes the top subspace of GEPs. Our core contribution is a family of fast algorithms for stochastic PLS, stochastic CCA, and Deep CCA, simply obtained by applying stochastic gradient descent (SGD) to the corresponding CCA objectives. These methods show far faster convergence and recover higher correlations than the previous state-of-the-art on all standard CCA and Deep CCA benchmarks. This speed allows us to perform a first-of-its-kind PLS analysis of an extremely large biomedical dataset from the UK Biobank, with over 33,000 individuals and 500,000 variants. Finally, we not only match the performance of `CCA-family' Self-Supervised Learning (SSL) methods on CIFAR-10 and CIFAR-100 with minimal hyper-parameter tuning, but also establish the first solid theoretical links to classical CCA, laying the groundwork for future insights.
Inferring independent sets of Gaussian variables after thresholding correlations
Saha, Arkajyoti, Witten, Daniela, Bien, Jacob
We consider testing whether a set of Gaussian variables, selected from the data, is independent of the remaining variables. We assume that this set is selected via a very simple approach that is commonly used across scientific disciplines: we select a set of variables for which the correlation with all variables outside the set falls below some threshold. Unlike other settings in selective inference, failure to account for the selection step leads, in this setting, to excessively conservative (as opposed to anti-conservative) results. Our proposed test properly accounts for the fact that the set of variables is selected from the data, and thus is not overly conservative. To develop our test, we condition on the event that the selection resulted in the set of variables in question. To achieve computational tractability, we develop a new characterization of the conditioning event in terms of the canonical correlation between the groups of random variables. In simulation studies and in the analysis of gene co-expression networks, we show that our approach has much higher power than a ``naive'' approach that ignores the effect of selection.
Day 176(Computer Vision) -- Age-Invariant Face Recognition
Feature Factorization -- A linear factorization module is introduced that decomposes the entire set of facial features into two sets of uncorrelated components(age & identity). This is followed by retrieving age-related details through a mapping function'R' and the residual part is considered as the identity component. During the inference time, only the identity-related features are utilised for face recognition. The first backbone network is similar to ResNets which extracts the initial features from the entire image. Decorrelated Adversarial Learning -- Even though we want both the components to be independent of each other, practically identity has some mix of features from the age information.